    Analysing and Reducing Costs of Deep Learning Compiler Auto-tuning

    Deep Learning (DL) is significantly impacting many industries, including automotive, retail and medicine, enabling autonomous driving, recommender systems and genomics modelling, amongst other applications. At the same time, demand for complex and fast DL models is continually growing. The most capable models tend to exhibit the highest operational costs, primarily due to their large computational footprint and the inefficient utilisation of computational resources by DL systems. In an attempt to tackle these problems, DL compilers and auto-tuners emerged, automating the traditionally manual task of DL model performance optimisation. While auto-tuning improves model inference speed, it is a costly process, which limits its wider adoption within DL deployment pipelines. The high operational costs associated with DL auto-tuning have multiple causes. During operation, DL auto-tuners explore large search spaces consisting of billions of tensor programs to propose candidates that improve DL model inference latency. Subsequently, DL auto-tuners measure candidate performance in isolation on the target-device, which constitutes the majority of auto-tuning compute-time. Suboptimal candidate proposals, combined with their serial measurement on an isolated target-device, lead to prolonged optimisation time and reduced resource availability, ultimately reducing the cost-efficiency of the process. In this thesis, we investigate the reasons behind prolonged DL auto-tuning and quantify their impact on optimisation costs, revealing directions for improved DL auto-tuner design. Based on these insights, we propose two complementary systems: Trimmer and DOPpler. Trimmer improves tensor program search efficacy by filtering out poorly performing candidates, and controls end-to-end auto-tuning using cost objectives that monitor optimisation cost. Simultaneously, DOPpler breaks long-held assumptions about serial candidate measurement by successfully parallelising measurements intra-device, with minimal penalty to optimisation quality. Through extensive experimental evaluation of both systems, we demonstrate that they significantly improve the cost-efficiency of auto-tuning (by up to 50.5%) across a plethora of tensor operators, DL models, auto-tuners and target-devices.
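
    The measure-and-select loop the thesis targets can be summarised in a short sketch. This is an illustration only, not the thesis's implementation: propose_candidates and measure_latency_ms are hypothetical callables standing in for an auto-tuner's search strategy and its on-device measurement, and the time budget stands in for Trimmer-style cost objectives.

```python
import time

def auto_tune(propose_candidates, measure_latency_ms, time_budget_s,
              rounds=10, batch=64):
    """Generic auto-tuning loop (illustrative): propose candidate tensor
    programs, measure each serially on the target device, keep the fastest,
    and stop once the cost budget is exhausted."""
    best_prog, best_ms = None, float("inf")
    start = time.monotonic()
    for _ in range(rounds):
        for prog in propose_candidates(batch):
            if time.monotonic() - start > time_budget_s:  # cost objective reached
                return best_prog, best_ms
            ms = measure_latency_ms(prog)  # serial on-device measurement dominates cost
            if ms < best_ms:
                best_prog, best_ms = prog, ms
    return best_prog, best_ms
```

    The inner measurement call is the expensive step the thesis analyses: Trimmer reduces how often it runs on poor candidates, while DOPpler parallelises it within the device.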

    Towards GPU Utilization Prediction for Cloud Deep Learning

    Understanding the GPU utilization of Deep Learning (DL) workloads is important for enhancing resource-efficiency and cost-benefit decision making for DL frameworks in the cloud. Current approaches to determine DL workload GPU utilization rely on online profiling within isolated GPU devices, and must be performed for every unique DL workload submission, resulting in resource under-utilization and reduced service availability. In this paper, we propose a prediction engine to proactively determine the GPU utilization of heterogeneous DL workloads without the need for in-depth or isolated online profiling. We demonstrate that it is possible to predict DL workload GPU utilization by extracting information from its model computation graph. Our experiments show that the prediction engine achieves an RMSLE of 0.154, and can be exploited by DL schedulers to achieve up to 61.5% improvement to GPU cluster utilization.
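
    As a concrete reference for the reported metric, below is the standard RMSLE formula in code, together with a toy computation-graph featurization; the op-type list and feature set here are assumptions for illustration, not the paper's actual feature extraction.

```python
import numpy as np

def rmsle(y_true, y_pred):
    """Root Mean Squared Logarithmic Error, the accuracy metric quoted above."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return np.sqrt(np.mean((np.log1p(y_pred) - np.log1p(y_true)) ** 2))

def graph_features(graph):
    """Toy featurization of a DL computation graph: per-op-type counts plus
    total parameter count. The paper's real features are likely richer; this
    only shows the idea of predicting utilization from static graph data."""
    op_types = ("conv2d", "dense", "batch_norm", "relu", "pool")
    counts = [sum(1 for n in graph["nodes"] if n["op"] == t) for t in op_types]
    return np.array(counts + [graph["num_params"]], dtype=float)
```

    A regression model (e.g. gradient-boosted trees) fitted offline on previously profiled workloads could then map such feature vectors to GPU utilization for unseen submissions, avoiding isolated online profiling.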

    Trimmer: Cost-Efficient Deep Learning Auto-tuning for Cloud Datacenters

    Provisioning high-performance Machine Learning-as-a-Service (MLaaS) at reduced resource cost in Cloud datacenters is achieved via auto-tuning: the automated tensor program optimization of Deep Learning models to minimize inference latency within a hardware device. However, given the extensive heterogeneity of Deep Learning models, libraries, and hardware devices, performing auto-tuning within Cloud datacenters incurs a significant time, compute resource, and energy cost, which state-of-the-art auto-tuning is not designed to mitigate. In this paper we propose Trimmer, a high-performance and cost-efficient Deep Learning auto-tuning framework for Cloud datacenters. Trimmer maximizes DL model performance and tensor program cost-efficiency by preempting tensor program implementations exhibiting poor optimization improvement, and by applying an ML-based filtering method that replaces expensive low-performing tensor programs, providing a greater likelihood of selecting low-latency tensor programs. Through an empirical study exploring the cost of DL model optimization techniques, our analysis indicates that 26-43% of total energy is expended on measuring tensor program implementations that do not positively contribute towards auto-tuning. Experiment results show that Trimmer achieves high auto-tuning cost-efficiency across different DL models, and reduces auto-tuning energy use by 21.8-40.9% for Cloud clusters whilst achieving DL model latency equivalent to state-of-the-art techniques.
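
    A minimal sketch of the filtering idea follows, assuming a learned cost model that predicts tensor program latency; the keep_fraction parameter and function names are hypothetical illustrations, not Trimmer's actual interface or algorithm.

```python
def filter_candidates(candidates, predicted_latency_ms, keep_fraction=0.25):
    """Rank proposed tensor programs by a learned cost model's predicted
    latency and keep only the most promising fraction, so that expensive
    on-device measurement is not spent on likely low-performing programs."""
    ranked = sorted(candidates, key=predicted_latency_ms)  # fastest predicted first
    keep = max(1, int(len(ranked) * keep_fraction))
    return ranked[:keep]
```

    Skipping measurement for the discarded fraction is what recovers the 26-43% of energy the study attributes to unproductive tensor program measurements.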

    The Promise and Peril of Parallel Chat in Video Meetings for Work

    We report the opportunities and challenges of parallel chat in work-related video meetings, drawing on a study of Microsoft employees’ remote meeting experiences during the COVID-19 pandemic. We find that parallel chat allows groups to communicate flexibly without interrupting the main conversation and to coordinate action around shared resources, and that it also improves inclusivity. On the other hand, parallel chat can also be distracting, overwhelming, and cause information asymmetries. Further, we find that whether an individual views parallel chat as a net positive in meetings is subject to the complex interactions between meeting type, personal habits, and intentional group practices. We suggest opportunities for tools and practices to capitalise on the strengths of parallel chat and mitigate its weaknesses.

    Environmental Consequence of Deep Learning

    Deep learning and artificial intelligence are often viewed as panacea technologies, ones which can decarbonise many industries. But what is the carbon cost of these systems? Damian Borowiec, Richard R. Harper and Peter Garraghan discuss.

    DOPpler: Parallel Measurement Infrastructure for Auto-tuning Deep Learning Tensor Programs

    The heterogeneity of Deep Learning models, libraries, and hardware poses an important challenge for improving model inference performance. Auto-tuners address this challenge via automatic tensor program optimization towards a target-device. However, auto-tuners incur a substantial time cost to complete, given that their design necessitates performing tensor program candidate measurements serially within an isolated target-device to minimize latency measurement inaccuracy. In this paper we propose DOPpler, a parallel auto-tuning measurement infrastructure. DOPpler allows for considerable auto-tuning speedup over conventional approaches whilst maintaining high-quality tensor program optimization. DOPpler accelerates the auto-tuning process by proposing a parallel execution engine that efficiently executes candidate tensor programs in parallel across the CPU-host and GPU target-device, and overcomes measurement inaccuracy by introducing a high-precision on-device technique for measuring tensor program kernel latency. DOPpler is designed to automatically calculate the optimal degree of parallelism to provision fast and accurate auto-tuning for different tensor programs, auto-tuners and target-devices. Experiment results show that DOPpler reduces total auto-tuning time by 50.5% on average whilst achieving optimization gains equivalent to conventional auto-tuning infrastructure.
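
    On-device timing of the kind described can be approximated with GPU events. The sketch below uses PyTorch's CUDA events purely as an illustration of high-precision on-device measurement under host-side concurrency; DOPpler's own infrastructure and mechanism may differ.

```python
import torch

def kernel_latency_ms(run_kernel, warmup=3, repeat=10):
    """Time a GPU kernel with on-device CUDA events rather than host clocks,
    so concurrent CPU-host activity does not skew the latency reading."""
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    for _ in range(warmup):           # warm up caches and compilation paths
        run_kernel()
    torch.cuda.synchronize()
    start.record()
    for _ in range(repeat):
        run_kernel()
    end.record()
    torch.cuda.synchronize()          # wait until both events are recorded
    return start.elapsed_time(end) / repeat  # mean milliseconds per invocation
```

    Because the events are timestamped on the GPU itself, several such measurements can in principle proceed while the CPU-host prepares other candidates, which is the intuition behind parallelising the measurement pipeline.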

    Meeting (the) Pandemic: Videoconferencing Fatigue and Evolving Tensions of Sociality in Enterprise Video Meetings During COVID-19

    When COVID-19 led to mandatory working from home, significant blind spots in supporting the sociality of working life, in the moment and over time, were revealed in enterprise video meetings, and these were a key factor in reports about videoconferencing fatigue. Drawing on a large study (N = 849) of one global technology company’s employees’ experiences of all-remote video meetings during the COVID-19 pandemic, we use a dialectic method to explore the tensions expressed by employees around effectiveness and sociality, as well as their strategies to cope with these tensions. We argue that videoconferencing fatigue arose partly due to work practices and technologies designed with assumptions of steady states and taken-for-granted balances between task and social dimensions of work relationships. Our analysis offers a social lens on videoconferencing fatigue and suggests the need to reconceptualize ideas around designing technologies and practices to enable both effectiveness and sociality in the context of video meetings.

    Overview of the current status of familial hypercholesterolaemia care in over 60 countries - The EAS Familial Hypercholesterolaemia Studies Collaboration (FHSC)

    Background and aims: Management of familial hypercholesterolaemia (FH) may vary across different settings due to factors related to population characteristics, practice, resources and/or policies. We conducted a survey among the worldwide network of EAS FHSC Lead Investigators to provide an overview of FH status in different countries.
    Methods: Lead Investigators from countries formally involved in the EAS FHSC by mid-May 2018 were invited to provide a brief report on FH status in their countries, including available information, programmes, initiatives, and management.
    Results: 63 countries provided reports. Data on FH prevalence are lacking in most countries. Where available, data tend to align with recent estimates, suggesting a higher frequency than that traditionally considered. Low rates of FH detection are reported across all regions. National registries and education programmes to improve FH awareness/knowledge are a recognised priority, but funding is often lacking. In most countries, diagnosis primarily relies on the Dutch Lipid Clinics Network criteria. Although available in many countries, genetic testing is not widely implemented (often due to cost issues). There are only a few national official government programmes for FH. Under-treatment is an issue. FH therapy is not universally reimbursed. PCSK9-inhibitors are available in ∼2/3 of countries. Lipoprotein-apheresis is offered in ∼60% of countries, although access is limited.
    Conclusions: FH is a recognised public health concern. Management varies widely across countries, with overall suboptimal identification and under-treatment. Efforts and initiatives to improve FH knowledge and management are underway, including the development of national registries, but support, particularly from health authorities, and better funding are greatly needed.